System and method for medical coding of vascular interventional radiology procedures

ABSTRACT

A system and method for identifying medical procedure codes and medical diagnosis codes from physician reports that describe a vascular interventional radiology (VasIR) procedure using a combination of natural language processing (NLP) and human medical coders. In one embodiment, the system and method of the present invention create billing results, and/or other documents, that are compliant with applicable legal and policy instructions from the government or a medical institution. Medical billing codes are efficiently extracted from medical reports using an NLP engine and a graphical user interface optimized for understanding the VasIR medical procedure described in the report.

BACKGROUND OF THE INVENTION

The invention relates to a method and a system for coding medical report documents and for optimizing accurate and compliant coding results, particularly in the field of vascular interventional radiology (VasIR).

In a general sense, interventional radiology concerns various imaging procedures that require some type of penetration of the body. The discipline may involve a combination of surgical procedures along with the supervision and interpretation of the results of the procedures. Typically, diagnostic radiology procedures are non-invasive, using various tools to image the body for diagnosis. However, in some cases, the radiologist will need to insert a medical device into the patient's body to facilitate the procedure. Usually, this is done for the purpose of delivering a tracer substance, for example, a dye or a contrast agent that helps create more detail in a medical image for a more complete, or more accurate diagnosis. There are a variety of procedures that fall into this category.

During various medical procedures, including VasIR procedures, a medical transcript is generated by one or more medical professionals. This transcription is typically lengthy, and has two primary components: (i) the medical procedure transcription describing the manipulation of the catheter through the body and the injection of dye, as done by the surgeon in the operating room; and (ii) a second transcription containing the supervision and interpretation (S&I) codes, which are a product of the review of the images obtained and the creation of a diagnosis as a result of this review.

In some cases a single physician performs the procedure as well as the S&I work, delivering one large transcription to a medical report coder, such as CodeRyte™. In other cases, two different physicians provide a transcription, one for the procedure and one for the reading. Each of the catheter and the S&I procedures has a different code associated with it.

In VasIR procedures a small insertion is made to a vein or an artery, into which a very thin catheter is inserted. The radiologist pushes and directs the catheter through the circulatory system to inject a contrast agent at a specific location, or to use an imaging system to visually examine the inside of an artery. These procedures require special training and experience, and have a high financial value to the radiologist.

For billing and medical tracking purposes, billing codes are associated with various medical reports. While in some cases, the process of extracting medical codes from a medical report transcript is a simple one, in other cases, especially in VasIR procedures, abstraction of the medical codes is very difficult and time consuming for a medical coder. To effectively and efficiently generate medical codes relating to VasIR procedures, a human coder needs to have an understanding of vascular procedures, vascular anatomy, and the rules of vascular medical billing as defined by the American Medical Association, the Center for Medicaid and Medicare Services, as well as related rules from other commercial payers and state agencies. These rules are lengthy, complex, subject to change over time, and are subject to audit and review by the Office of Inspector General in the Department of Health and Human Services. For example, Local Medical Review Policy (LMRP) is an administrative and educational tool to assist providers, physicians and suppliers in submitting correct claims for payment based on medical necessity. Local policies outline how contractors will review claims to ensure that they meet Medicare coverage requirements. Also, the Correct Coding Initiative (CCI) was created to promote correct coding methodologies and to control improper coding that leads to inappropriate payment of Part B health insurance claims.

From a financial point of view, it is important to associate the appropriate diagnosis code with a specific medical procedure, because financial compensation for the medical procedure will depend on the codes entered in the billing system. From a medical point of view, using the appropriate medical code will also facilitate the review and tracking of a patient's medical history.

The high cost and complexity of VasIR procedures, combined with the complexity of the rules governing the coding of the procedures, create a likelihood of human error in the coding and a need for automating the coding of VasIR medical reports in order to efficiently provide consistent and compliant results.

Natural Language Processing, broadly speaking, is the technology of analyzing language through computer software to determine the structure of a document and the facts contained in it without the need for the document to be organized in a specific limited vocabulary. A ‘natural language’ is any of the languages naturally used by humans, i.e. not an artificial or man-made language such as a programming language. ‘Natural Language Processing’ (NLP) is a convenient description for all attempts to use computers to process natural language. The technology has been generally applied to language translation, voice-to-text applications, and population of databases with reduced need for human intervention. The use of NLP in the healthcare context has already been suggested. For example, U.S. Pat. No. 6,182,029 to Friedman teaches using an NLP system for extracting information from a natural language document, the system being adaptable to a medical, clinical or scientific application. However, the prior art in general, and Friedman in particular, have not addressed the need for automating the coding of VasIR medical reports.

SUMMARY OF THE INVENTION

To address the need for automating the coding of VasIR medical reports, in one embodiment of the present invention, medical billing codes are electronically assigned to medical reports using a natural language processing (NLP) process. The results of the NLP process are then displayed on a custom graphical user interface (GUI) for review by human medical coders. Human coders review and approve the codes, and provide feedback to the NLP engine if corrections are needed. This feedback process provides training for the NLP engine, allowing it to add to its knowledge base over time and to expand its ability to provide reliable coding of VasIR medical reports.

The GUI interface may be customized and optimized to display, in detail, the medical report, important related data, and coding engine output relating to the vascular procedures. Human coders are then able to review the results, modify them, add to them and/or approve them through the GUI interface. In addition to providing the ability to review the medical report, the interface may list the facts identified by the NLP engine, and display a graphical vascular anatomy diagram that acts as a reference for reviewing and updating the billing codes. The GUI interface may also highlight the anatomical path of the VasIR procedure, providing a visual representation of any coding error caused by a break in the path. Such an interface may advantageously assist a non-physician human coder in understanding the medical report. Additionally, the interface may also allow the coder to enter data using anatomical references without the need to memorize the billing codes, and may also calculate the codes appropriate for the anatomy and the billing rules. The custom GUI interface may facilitate significant savings in coding time, and may be utilized to facilitate more consistent and compliant coding results from the NLP engine and the human coder.

While the following disclosure focuses on coding VasIR procedures, the invention disclosed herein may be applicable and adaptable to various medical fields and clinical or scientific applications.

DESCRIPTION OF THE DRAWINGS

While the specification concludes with claims particularly pointing out and distinctly claiming the present invention, it is believed the same will be better understood from the following description taken in conjunction with the accompanying drawings, which illustrate, in a non-limiting fashion, the best mode presently contemplated for carrying out the present invention, and in which like reference numerals designate like parts throughout the figures wherein:

FIG. 1 is a flowchart of a process of extracting billing codes using natural language processing.

FIG. 2 demonstrates an input table that allows human coders to make corrections to, and add, procedure codes.

FIG. 3 shows an interface for checking the validity of the codes entered.

FIG. 4 demonstrates the highlighting mechanism of the interface.

FIG. 5(A) shows the portion of a user interface for adding a procedure or vessel.

FIG. 5(B) shows the portion of a user interface that displays a list of valid procedures and vessels in the database.

FIG. 5(C) shows the portion of a user interface that chooses and highlights valid procedures and vessels in the database according to text entered by the user.

FIG. 6(A) shows a vascular diagram with vessel labels that adjust appropriately to the gender of the patient.

FIG. 6(B) shows a vascular diagram with vessel highlighting based on catheter placement.

FIG. 6(C) shows a vascular diagram with vessel highlighting when a ‘procedure row’ option is selected.

DETAILED DESCRIPTION OF THE INVENTION

One of the features of one embodiment of the present invention is a system and method for predicting medical diagnosis codes based on the metadata and language information present in a medical chart. Many computer systems can be viewed as making predictions based on a model. For example, speech recognition predicts what words were said based on the acoustic signal and the context, and fraud detection systems predict whether a transaction requires investigation based on features including the transaction type, amount, past history, etc. In one embodiment of the present invention, the prediction engine makes predictions (e.g., of procedure and diagnosis codes) for a chart based on the metadata and language information present in the chart.

As shown in FIG. 1, according to one embodiment of the present invention, natural language processing (NLP) is used to extract billing codes by: (i) regioning the report into sections; (ii) detecting, parsing, and normalizing medical terminology and facts relevant to vascular IR; (iii) detecting the laterality of the vessel or procedure component; (iv) determining negation or other non-relevance of term usage; (v) determining the catheterization insertion point(s) and the directional approach; (vi) determining the sequence and path of vessels catheterized during the procedure; (vii) detecting abnormal vascular anatomy and adjusting the analysis appropriately; (viii) determining the physical areas imaged and interpreted in the procedure, the view details, and the relevant sequencing of these images; (ix) determining the appropriate billing or medical codes for the procedure based on the rules of vascular IR coding; (x) ordering and filtering the identified billing codes based on the catheterization paths employed in the procedure; (xi) adjusting the billing codes as appropriate based on the medical order of the procedure; (xii) filtering identified billing codes that are not appropriate to bill based on the medical details of the procedure; (xiii) linking anatomical terms seen in the procedure to billing codes, and to a graphical depiction of the procedure to aid in coding accuracy; (xiv) identifying and classifying other vascular IR procedures and therapies; (xv) identifying procedures which may not be part of a vascular IR procedure, but which may have occurred during the physician's examination and which would need to be reported to a medical billing application; (xvi) identifying medical diagnosis code(s) associated with the procedure(s) that took place; (xvii) passing an intermediate data structure to the user interface identifying specific locations in the medical report where facts were located as offsets of text to drive text highlighting in the user interface; and (xviii) identifying the confidence in the billing codes generated by the NLP processing for the text examined, and reporting that confidence to the human coder.

Step A in FIG. 1 concerns regioning the report into sections. In order to interpret the medical language in a vascular IR report correctly, the major functional section of the narrative to which each sentence and phrase in the report corresponds is determined. These functional sections or text regions typically include the following: (a) Catheterization description (CATH), including the catheter insertion, vessel traversal and removal, with details of events and observations taking place in this process; (b) Imaging interpretation (S&I), including medical analysis and interpretation of the radiological images obtained during the procedure; (c) Therapy description (THER), including a description of therapies performed during the vascular IR procedure, for example, angioplasties (ANGP), embolizations (EMBOL), and the other major and minor types of therapies performed during vascular IR procedures; (d) Clinical indication or history (HIST), including the presenting symptoms and preliminary diagnoses motivating the procedure; (e) Final Impression (IMPR), including the final diagnostic impression of the patient's medical findings based on the results of the procedure(s); (f) Other dictation (OTHER), including other dictated information not corresponding to the above functional sections; and (g) Non-dictated information (NONDICT), including sections of the medical report not dictated by the physician, such as patient demographic information, transcription details, and physician practice details.

Each sentence within the report is assigned to one or more of the above functional regions, and for sentences containing multiple functional regions, each phrase (or subsequence of words) within the sentence is assigned to a single most-appropriate functional region. In each case, the approximate probability that the sentence/phrase belongs to the assigned functional region is also returned.

The regioning process consists of two phases: (1) a training phase, and (2) a runtime analysis phase. In the training phase, a set of medical reports with each sentence or phrase pre-annotated by functional region is statistically analyzed and the conditional probabilities P(region_i|phrase_j) and P(region_i|word_j) are computed for each observed phrase and word in the training data. Likewise, the transition probability P(region_j|region_i) between adjacent sentences, phrases and words is computed by direct observation in the training data. At runtime, the probabilities P(region_i|sentence_j) and P(region_i|phrase_j) for all region_i are computed via the training-time estimates of P(region_i|phrase_j), P(region_i|word_j) and P(region_j|region_i) using a Markov model. In parallel with the Markov models, regions containing strongly indicative phrases (where P(region_i|phrase_j) is high) are classified independently as region_i in cases where P(region_i|phrase_j) exceeds a threshold (tuned on held-out data) for one or more phrase_j in the sentence.
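By way of non-limiting illustration, the runtime phase can be sketched as a Viterbi decoding over sentences. This is a minimal sketch under stated assumptions, not the disclosed implementation: the probability values, the four-region subset, the single STAY transition parameter and the function names are all hypothetical, and the per-word estimates P(region|word) are used directly as sentence scores for simplicity.

```python
import math
import re

REGIONS = ["HIST", "CATH", "S&I", "IMPR"]  # subset of the regions, for brevity
UNIFORM = 1.0 / len(REGIONS)

# Hypothetical training-time estimates of P(region | word); values are made up.
P_REGION_GIVEN_WORD = {
    "history": {"HIST": 0.85, "CATH": 0.05, "S&I": 0.05, "IMPR": 0.05},
    "catheter": {"HIST": 0.05, "CATH": 0.85, "S&I": 0.05, "IMPR": 0.05},
    "advanced": {"HIST": 0.05, "CATH": 0.80, "S&I": 0.10, "IMPR": 0.05},
    "demonstrates": {"HIST": 0.05, "CATH": 0.10, "S&I": 0.80, "IMPR": 0.05},
    "stenosis": {"HIST": 0.20, "CATH": 0.05, "S&I": 0.45, "IMPR": 0.30},
    "impression": {"HIST": 0.05, "CATH": 0.05, "S&I": 0.05, "IMPR": 0.85},
}

STAY = 0.7  # hypothetical P(region_j | region_i) when the region does not change


def transition(prev, cur):
    """Transition probability between the regions of adjacent sentences."""
    return STAY if prev == cur else (1.0 - STAY) / (len(REGIONS) - 1)


def sentence_score(words, region):
    """Log score of a region for one sentence; unseen words score uniformly."""
    return sum(
        math.log(P_REGION_GIVEN_WORD.get(w, {}).get(region, UNIFORM))
        for w in words
    )


def label_regions(sentences):
    """Return the most likely region sequence for a list of sentences."""
    if not sentences:
        return []
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    # trellis[t][region] = (best log score ending in region, previous region)
    trellis = [{r: (math.log(UNIFORM) + sentence_score(tokenized[0], r), None)
                for r in REGIONS}]
    for words in tokenized[1:]:
        row = {}
        for cur in REGIONS:
            prev = max(REGIONS, key=lambda p: trellis[-1][p][0]
                       + math.log(transition(p, cur)))
            score = trellis[-1][prev][0] + math.log(transition(prev, cur))
            row[cur] = (score + sentence_score(words, cur), prev)
        trellis.append(row)
    # Backtrace from the best final region.
    path = [max(REGIONS, key=lambda r: trellis[-1][r][0])]
    for row in reversed(trellis[1:]):
        path.append(row[path[-1]][1])
    return path[::-1]
```

The independent threshold-based classification of strongly indicative phrases described above is omitted from this sketch; it could be layered on by overriding the decoded region wherever a phrase estimate exceeds the tuned threshold.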

It will be understood that electronic data from the reports can be collected and fed into the coding system in a variety of ways, for example, through a network connection from the report originator to the coding entity. It should also be understood that information or data relating to a medical report or set of medical reports may be obtained from multiple data feeds and merged as desired to provide the coding entity with additional information concerning the patient, procedure or other related data.

Step B in FIG. 1 concerns detecting, parsing and normalizing medical terminology and facts relevant to vascular IR. The medical terminology relevant to vascular IR often occurs in multiple variant forms that need to be normalized and disambiguated. To accomplish this, first, a large database of vascular IR terms is created, particularly including all known human blood vessels (arteries and veins), such as the SUPERIOR_MESENTERIC_ARTERY. Second, a list of possible variant descriptions of terms, mapped to their standard term names (e.g. “SMA”=SUPERIOR_MESENTERIC_ARTERY), is created. Described below is how this variant term list and the map to standardized forms are created and augmented semi-automatically from human coder feedback via the vascular IR workspace. An ensemble of regular expression recognition machines is then created for all standard and variant terms in this specialized database created for this purpose. At run time, all matching standard and variant forms of vascular IR terminology are located and annotated with the standard term name corresponding to each matched span of text. In the cases where there are multiple terms matching in a given span of text, a combination of longest string match, local word association and discourse state is used to resolve the ambiguity. The primary use of “discourse state” is to disambiguate veins from arteries, a systematic ambiguity encountered when neither “artery” nor “vein” is explicitly stated for the vessel (by far the most common situation). Thus, the term “the superior mesenteric” without explicit mention of artery or vein is resolved between SUPERIOR_MESENTERIC_ARTERY and SUPERIOR_MESENTERIC_VEIN by the discourse state feature of whether the arterial system or venous system is currently being traversed at the point of discussion. In the absence of disambiguating information, the standard term candidate with the highest Bayesian prior probability is preferred. The variant surface terminology patterns matched in the text need not be contiguous and may contain wildcards and structured subfields, such as number ranges, as part of the matching term pattern. The patterns may also be based on a morphological analysis of the term components to their uninflected lemma form. All relevant medical terminology, including basic procedure descriptions and components, is handled in this common phrasal term matching and parsing framework.
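As a non-limiting illustration of the variant-to-standard mapping with longest string match and discourse-state disambiguation, consider the following minimal sketch. The variant table is a tiny hypothetical fragment, the "smv" abbreviation and the function name normalize_terms are assumptions, and non-contiguous patterns, wildcards, lemmatization and Bayesian priors are not modeled.

```python
import re

# Hypothetical fragment of the variant list: surface form -> standard term.
# None marks a form that is ambiguous between artery and vein.
VARIANTS = {
    "superior mesenteric artery": "SUPERIOR_MESENTERIC_ARTERY",
    "sma": "SUPERIOR_MESENTERIC_ARTERY",
    "superior mesenteric vein": "SUPERIOR_MESENTERIC_VEIN",
    "smv": "SUPERIOR_MESENTERIC_VEIN",
    "superior mesenteric": None,
}

# Resolution of ambiguous forms by the vascular system currently traversed.
BY_DISCOURSE_STATE = {
    "superior mesenteric": {
        "arterial": "SUPERIOR_MESENTERIC_ARTERY",
        "venous": "SUPERIOR_MESENTERIC_VEIN",
    },
}

# Alternatives are ordered longest first, so the first alternative that
# matches at a position is also the longest string match.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in
                      sorted(VARIANTS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def normalize_terms(text, discourse_state="arterial"):
    """Return (start, end, standard_term) for each matched span of text."""
    found = []
    for match in _PATTERN.finditer(text):
        surface = match.group(1).lower()
        standard = VARIANTS[surface] or BY_DISCOURSE_STATE[surface][discourse_state]
        found.append((match.start(), match.end(), standard))
    return found
```

The returned character offsets are the kind of information that could later drive text highlighting in the user interface.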

Step C in FIG. 1 concerns detecting the laterality of the vessel or procedure component. In essentially all cases, the laterality (left, right, bilateral or none) of the dictated vessel or procedure component is a necessary distinction, yet the laterality information is often not stated explicitly or contiguously. A specialized analysis machine is created to determine the intended laterality of all salient vascular IR terminology, including vessels and procedure components. The laterality assignment of each such target term is based on the following features, in order of precedence: (1) an explicit and directly adjacent pre-modifying mention of laterality (e.g. “left common carotid”), (2) a direct but non-pre-modifying-contiguous syntactic relationship extracted by syntactic parsing (e.g., “the X on the left side”), (3) potentially ambiguous laterality description via pre-modifying conjunction (e.g. “the left common carotid and internal carotid”), (4) the consensus laterality of the current sentence (all laterality mentions agree) if at least one laterality mention is before the target term, (5) the nearest laterality term in the same sentence before the target term, (6) the consensus laterality of the current sentence if the only laterality mention follows the target term, (7) the nearest laterality term in the same sentence following the target term, (8) the consensus laterality in the preceding sentence (if all laterality terms in the previous sentence indicate the same laterality), (9) the latest mention of laterality in the preceding sentence if multiple lateralities are present, (10) the consensus laterality of the nearest preceding sentence containing a laterality term, and (11) other document-level laterality information.

Abbreviations and other variant laterality indicators are permitted and detected, including “right”, “rgt”, “rt”, “r”, etc., and disambiguated appropriately. A confidence score is associated with each laterality assignment based on the best present laterality feature in the precedence sequence above and the statistical likelihood of accuracy of each of these features (independently and in combination) based on their correlation with truth in laterality-annotated training data. Also, stand-alone laterality classification mechanisms have been developed and utilized for each of the above laterality features independently and in combination.
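The precedence-ordered fallback can be illustrated with the following minimal sketch, which covers only a simplified subset of the features above: an adjacent pre-modifier, the nearest mention before the term, the nearest mention after the term, and the latest mention in the nearest preceding sentence that has one. Syntactic parsing, conjunction handling, consensus checks and confidence scores are not modeled. The "lt" abbreviation, the feature labels and the function names are assumptions for illustration.

```python
import re

# Surface indicators mapped to a normalized laterality ("lt" is an assumption).
LATERALITY_WORDS = {
    "left": "LEFT", "lt": "LEFT",
    "right": "RIGHT", "rgt": "RIGHT", "rt": "RIGHT",
    "bilateral": "BILATERAL",
}


def tokens(sentence):
    """Lowercased alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def mentions(words):
    """(position, laterality) for every laterality indicator in a token list."""
    return [(i, LATERALITY_WORDS[w]) for i, w in enumerate(words)
            if w in LATERALITY_WORDS]


def assign_laterality(sentences, sent_idx, term):
    """Return (laterality, feature_label) for `term` in sentences[sent_idx]."""
    words = tokens(sentences[sent_idx])
    term_words = tokens(term)
    n = len(term_words)
    start = next((i for i in range(len(words) - n + 1)
                  if words[i:i + n] == term_words), None)
    if start is None:
        raise ValueError("term not found in sentence")
    found = mentions(words)
    before = [m for m in found if m[0] < start]
    after = [m for m in found if m[0] >= start + n]
    if before and before[-1][0] == start - 1:
        return before[-1][1], "adjacent_premodifier"
    if before:
        return before[-1][1], "nearest_before"
    if after:
        return after[0][1], "nearest_after"
    # Fall back to the nearest preceding sentence with a laterality mention.
    for prev in range(sent_idx - 1, -1, -1):
        prev_found = mentions(tokens(sentences[prev]))
        if prev_found:
            return prev_found[-1][1], "preceding_sentence"
    return "NONE", "no_evidence"
```

Returning the feature label alongside the laterality leaves room for attaching a per-feature confidence estimated from annotated training data.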

Step D in FIG. 1 concerns determining negation or other non-relevance of term usage. The determination of correct billing codes and other medical information extraction applications requires that linguistic evidence be identified (when present) indicating the negation/absence of some action or finding, or the other non-relevance of some action or finding. A mechanism is created to detect and exclude such contexts by using a set of both hand-crafted regular expression machines and syntactic phrasal contexts indicating negation/non-relevance, and statistical associations of such phrases and regular-expression-machine matches with negation/non-relevance as statistically observed with confidence in an annotated training data set. Negated/non-relevant phrasal contexts are identified and demarcated using the longest-matching expressions in this terminological inventory specialized and crafted for this purpose, and/or the highest confidence matching expressions. Types of negated/non-relevant context detectors that have been instantiated by this general mechanism include, but are not limited to, detectors of actions/findings that are either: (1) explicitly not done/not found, (2) done at a future time, (3) done (or observed) at a time previous to the dictated encounter, (4) started but not completed during the dictated encounter due to some interrupting factor, (5) only contemplated but not yet performed, (6) only recommended but not yet performed, (7) ordered but not yet performed, (8) findings to be ruled out, and (9) actions/findings relevant to individuals other than the patient. An ensemble of one or more of all of these separate detection mechanisms is applied to the data and the marked contexts are excluded from subsequent extraction of actions or findings.
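A minimal sketch of such a detector ensemble follows. It is a simplification under stated assumptions: the cue expressions are hypothetical examples, the excluded context is taken to be the whole sentence containing a cue rather than a demarcated phrase, and the statistical confidence component is omitted.

```python
import re

# Hypothetical cue expressions for a few of the detector types listed above.
CUES = [
    ("NOT_DONE", r"\b(?:was|were) not\b|\bno evidence of\b"),
    ("FUTURE", r"\bwill be\b|\bis scheduled\b"),
    ("PREVIOUS", r"\bprevious(?:ly)?\b|\bprior\b"),
    ("RECOMMENDED", r"\brecommend(?:ed)?\b"),
    ("RULE_OUT", r"\brule out\b"),
]
COMPILED = [(label, re.compile(pattern, re.IGNORECASE)) for label, pattern in CUES]


def excluded_spans(text):
    """Return (start, end, label) for each sentence containing a cue."""
    spans = []
    for sentence in re.finditer(r"[^.]+\.?", text):
        for label, pattern in COMPILED:
            if pattern.search(sentence.group()):
                spans.append((sentence.start(), sentence.end(), label))
                break  # first matching detector labels the sentence
    return spans


def is_excluded(offset, spans):
    """True if a character offset falls inside any excluded span."""
    return any(start <= offset < end for start, end, _ in spans)
```

A term located in step (B) would be dropped from subsequent extraction whenever is_excluded returns True for its starting offset.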

Step E in FIG. 1 concerns determining the catheterization insertion point(s) and the directional approach. The determination of correct billing codes and other medical information extraction applications requires the determination of the catheterization insertion point(s) and the directional approach (e.g. retrograde). A specialized mechanism is developed to detect catheter insertion points and approach directions utilizing (1) a set of hand-crafted regular expression machines of confident phrasal descriptions of insertion point and approach (e.g. “the right groin area was prepped and . . . was inserted into the femoral artery”), and (2) a database of partial phrasal descriptions of insertion point and approach information and correlation statistics between these features and the true insertion point in annotated training data, utilized in a standard Bayesian classification framework. In the absence of explicit information about insertion point and directional approach, the “standard approach” for the procedure in question is utilized as the default.

Step F in FIG. 1 concerns determining the sequence and path of vessels catheterized during the procedure. The determination of correct billing codes and other medical information extraction applications requires the determination of the sequence and path of vessels catheterized during the vascular IR procedure. To accomplish this, a database is created of vessel branching topology for the entire human arterial and venous system (e.g. RIGHT_EXTERNAL_CAROTID arises from RIGHT_COMMON_CAROTID). Utilizing these anatomical constraints, the sequence of vessels catheterized is determined by first extracting those vessels detected in step (B) and also appearing in a text region labeled in step (A) as CATH (catheterization descriptions) and not excluded by the negation/non-relevance detection module in step (D). The relevant laterality of each detected vessel is also incorporated from step (C). The resulting lateralized and filtered vessel lists are then sequenced by a mechanism incorporating (1) the order in which these catheterization region (CATH) descriptions of vessels were dictated in the report; (2) regular expression machines that match linguistic terms and syntactic patterns indicating temporal or physical catheterization sequence (e.g. “after X, Y was performed”); and (3) the anatomical constraints described above (catheterization of the RIGHT_EXTERNAL_CAROTID must first traverse the RIGHT_COMMON_CAROTID). The location and sequence of any therapies (such as angioplasties) are equivalently contextualized and sequenced using the same mechanism.
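The use of branching topology as a sequencing constraint can be sketched as follows. This is a minimal illustration, assuming a child-to-parent table rooted at the aorta; the table is a small hypothetical fragment, and the dictation-order input stands in for the combined output of steps (A) through (D). Temporal cue expressions such as “after X, Y was performed” are not modeled.

```python
# Hypothetical fragment of the branching topology: vessel -> vessel it arises from.
PARENT = {
    "RIGHT_EXTERNAL_CAROTID": "RIGHT_COMMON_CAROTID",
    "RIGHT_INTERNAL_CAROTID": "RIGHT_COMMON_CAROTID",
    "RIGHT_COMMON_CAROTID": "BRACHIOCEPHALIC",
    "RIGHT_SUBCLAVIAN": "BRACHIOCEPHALIC",
    "BRACHIOCEPHALIC": "AORTA",
    "LEFT_COMMON_CAROTID": "AORTA",
}


def path_from_aorta(vessel):
    """Vessels that must be traversed to reach `vessel`, root first."""
    path = [vessel]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def catheterization_sequence(dictated):
    """Expand vessels (in dictation order) with their required intermediates."""
    sequence = []
    for vessel in dictated:
        for step in path_from_aorta(vessel):
            if step not in sequence:
                sequence.append(step)
    return sequence
```

Because the intermediate vessels are made explicit, a dictated vessel whose path cannot be connected to the rest of the sequence would show up as a break in the path.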

Step G in FIG. 1 concerns detecting abnormal vascular anatomy and adjusting the analysis appropriately. The determination of correct billing codes and other medical information extraction applications requires that abnormal vascular anatomy be detected and applicable analysis be adjusted accordingly. The classic (and relatively common) instance of abnormal vascular anatomy is the “bovine arch”, where the left common carotid artery branches from the brachiocephalic artery rather than directly from the aorta, affecting the number of branches traversed in the catheterization process and hence the corresponding complexity code. Specialized regular expression machines are created to detect the numerous variant ways of referring to this anatomical abnormality (e.g. “the arch was bovined”) and the anatomical attachment tables utilized in mechanism (F) are adjusted accordingly. For the purposes of data representation, LEFT_INTERNAL_CAROTID#BOVINE is treated as a distinct vessel with its own appropriate vascular branching and attachment topology in step (F). Other anatomical abnormalities in vascular attachment are detected and handled via the same mechanism.

Step H in FIG. 1 concerns determining the physical areas imaged and interpreted in the procedure, the view details and the relevant sequencing of these images. The determination of correct billing codes and other medical information extraction applications requires the detection and classification of physical areas imaged and interpreted in the procedure, the view details and the relevant sequencing of these images. This is accomplished using a mechanism analogous to that used for tracing the catheterization of vascular structures. Those vessels detected in step (B) and discussed in a region labeled as S&I in step (A) and not excluded by the negation/non-relevance detection module in step (D) are considered. The relevant laterality of each detected vessel is also incorporated from step (C). Relevant view details and sequencing information are also extracted using a mechanism based on step (F). The resulting lateralized and filtered vessel lists are then further condensed by merging those vessels independently detected in S&I sections with both left and right laterality into a joint BILATERAL entry for the vessel. Furthermore, general region features such as BILATERAL_LOWER_EXTREMITIES are then activated by an association table of vessel membership by area. This candidate set of imaged and interpreted vessels is entered into the vascular IR workspace for subsequent coding and/or information extraction.

Step I in FIG. 1 concerns determining the appropriate billing codes for the procedure based on the rules of vascular IR coding. Once step (F) has determined the sequence of lateralized vessels that have been catheterized in the procedure, and the insertion point/approach (e.g. RIGHT_FEMORAL) has been detected in step (E), the corresponding candidate procedure code is determined by looking up the triple <vessel,laterality,approach>=code in a pre-computed table. Likewise, the S&I imaging codes are determined by looking up the <vessel,laterality> and <vessel_region,laterality> tuples in an equivalent table. S&I coding is generally independent of the catheterization approach.
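The table lookups can be sketched with dictionaries keyed by tuples. The sketch below is illustrative only: the vessel names, the placeholder code strings (PROC_A, SI_UNILATERAL and so on) and the function names are hypothetical and do not correspond to actual billing codes.

```python
# Hypothetical pre-computed table: (vessel, laterality, approach) -> procedure code.
PROCEDURE_CODES = {
    ("RENAL_ARTERY", "RIGHT", "RIGHT_FEMORAL"): "PROC_A",
    ("RENAL_ARTERY", "LEFT", "RIGHT_FEMORAL"): "PROC_B",
}

# Hypothetical S&I table keyed by (vessel or vessel region, laterality).
# The approach is deliberately not part of the key.
SI_CODES = {
    ("RENAL_ARTERY", "RIGHT"): "SI_UNILATERAL",
    ("RENAL_ARTERY", "BILATERAL"): "SI_BILATERAL",
    ("LOWER_EXTREMITIES", "BILATERAL"): "SI_REGION",
}


def lookup_procedure_code(vessel, laterality, approach):
    """Candidate procedure code, or None if the triple is not in the table."""
    return PROCEDURE_CODES.get((vessel, laterality, approach))


def lookup_si_code(vessel_or_region, laterality):
    """Candidate S&I code, or None if the tuple is not in the table."""
    return SI_CODES.get((vessel_or_region, laterality))
```

Returning None for a missing key leaves the decision about an uncoded vessel to the human coder rather than guessing a code.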

Step J in FIG. 1 concerns ordering and filtering the identified billing codes based on the catheterization paths employed in the procedure. Medical procedure coding rules require that certain vessels not be coded if they are on the catheterization pathway of more distant vessels that are also coded. Based on the vessel catheterization sequence determined in step (F), and an ON_PATH (vessel1, vessel2) table, those intermediate vessel catheterization descriptions not allowed to be coded under medical rules are marked with an ON_PATH filter code in the vascular IR workspace.
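A minimal sketch of the ON_PATH filter follows, assuming the table is stored as a set of (intermediate vessel, more distant vessel) pairs. The pairs shown and the function name mark_on_path are hypothetical.

```python
# Hypothetical ON_PATH table: (vessel1, vessel2) means vessel1 lies on the
# catheterization pathway to the more distant vessel2.
ON_PATH = {
    ("BRACHIOCEPHALIC", "RIGHT_COMMON_CAROTID"),
    ("BRACHIOCEPHALIC", "RIGHT_EXTERNAL_CAROTID"),
    ("RIGHT_COMMON_CAROTID", "RIGHT_EXTERNAL_CAROTID"),
}


def mark_on_path(catheterized):
    """Pair each catheterized vessel with "ON_PATH" if it must not be coded."""
    marked = []
    for vessel in catheterized:
        on_path = any((vessel, other) in ON_PATH
                      for other in catheterized if other != vessel)
        marked.append((vessel, "ON_PATH" if on_path else None))
    return marked
```

Marking rather than deleting the intermediate vessels keeps them visible in the workspace, so a reviewer can see why they were not coded.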

Step K in FIG. 1 concerns adjusting the billing codes as appropriate based on the medical order of the procedure. Likewise, when multiple most-distant vessels (not filtered in step (J)) are catheterized in the same vascular “family” (i.e. sharing the same branch from the aorta or vena cava), the billing codes for catheterization occurrences of additional vessels in the same family are assigned a different billing code. To determine the circumstances in which this adjustment should occur, a function FAMILY (vessel, laterality) is computed, and when FAMILY (vessel_i, laterality_i)=FAMILY (vessel_j, laterality_j) for some previously coded vessel_i, the billing code for vessel_j is adjusted as per medical coding rules. Similar applications of code transformations based on other detailed medical coding rules are employed as appropriate.
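The family-based adjustment can be sketched as below. The family table, the two placeholder labels INITIAL_IN_FAMILY and ADDITIONAL_IN_FAMILY, and the function names are assumptions for illustration; actual billing rules determine which codes are substituted.

```python
# Hypothetical table behind FAMILY(vessel, laterality): the branch from the
# aorta that the vessel shares with other members of its family.
FAMILY_BRANCH = {
    ("COMMON_CAROTID", "RIGHT"): "BRACHIOCEPHALIC",
    ("SUBCLAVIAN", "RIGHT"): "BRACHIOCEPHALIC",
    ("COMMON_CAROTID", "LEFT"): "LEFT_COMMON_CAROTID",
}


def family(vessel, laterality):
    """FAMILY(vessel, laterality) as a table lookup."""
    return FAMILY_BRANCH[(vessel, laterality)]


def adjust_for_family(coded_vessels):
    """Label each (vessel, laterality) as initial or additional in its family."""
    seen = set()
    adjusted = []
    for vessel, laterality in coded_vessels:
        fam = family(vessel, laterality)
        label = "ADDITIONAL_IN_FAMILY" if fam in seen else "INITIAL_IN_FAMILY"
        adjusted.append((vessel, laterality, label))
        seen.add(fam)
    return adjusted
```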

Step L in FIG. 1 concerns filtering identified billing codes that are not appropriate to bill based on the medical details of the procedure. Medical coding rules also require that certain S&I imaging and interpretation codes only be coded when the vessel(s) in question have been selectively catheterized (i.e. the catheter inserted directly into the vessel as opposed to having dye injected into the vessel from a distance without direct catheterization). The candidate S&I codes determined in step (H) are compared with those selectively catheterized as detected in step (F) and those matching S&I entries are marked in the workspace as being selectively catheterized. A regular-expression detection of an explicit linguistic description of selective catheterization of the vessel is also used to mark selective catheterization in the workspace. Those S&I codes appearing in a table requiring selective catheterization and not marked in the workspace as being selectively catheterized are filtered with a not-selectively-catheterized-when-required annotation and these codes are not generated for billing purposes. Similar applications of detailed medical logic rules are utilized to filter inappropriate billing codes as appropriate.
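This filter reduces to a membership check, sketched below under stated assumptions: the placeholder code names, the annotation string and the function name filter_si_codes are hypothetical, and the set of selectively catheterized vessels is taken as given from step (F) and the regular-expression detection.

```python
# Hypothetical table of S&I codes that require selective catheterization.
REQUIRES_SELECTIVE = {"SI_SELECTIVE_RENAL"}

ANNOTATION = "NOT_SELECTIVELY_CATHETERIZED_WHEN_REQUIRED"


def filter_si_codes(candidates, selectively_catheterized):
    """Split (code, vessel) candidates into billable and filtered entries."""
    kept, filtered = [], []
    for code, vessel in candidates:
        if code in REQUIRES_SELECTIVE and vessel not in selectively_catheterized:
            filtered.append((code, vessel, ANNOTATION))
        else:
            kept.append((code, vessel))
    return kept, filtered
```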

Step M in FIG. 1 concerns linking anatomical terms seen in the procedure to billing codes, and to a graphical depiction of the procedure to aid in coding accuracy. At all steps (A) through (L) where textual patterns are detected, the starting and ending character offsets of the matching text are recorded along with their associated workspace entry. Thus any generated billing code in the workspace is associated with both the normalized vessel name, laterality, etc. motivating the code and a highlightable pointer to the start/ending position of the original motivating text. Likewise, the associated normalized vessel name and laterality stored in the workspace are also linked directly to anatomical regions in the GUI, allowing for the highlighting of the catheterized and imaged vessels as desired, as described below.

Step N in FIG. 1 concerns identifying and classifying other vascular IR procedures and therapies. Other vascular IR procedures such as angioplasties and embolizations are handled using essentially the same sequence of mechanisms (A) through (M) as used for the default angiogram application described above. The major difference is that if the ANGP or EMBOL or other THER description regions are detected in step (A), the detected vessels and other vascular-IR-relevant information identified in step (B) are looked up in a table of angioplasties and embolizations, etc. for mapping to the candidate procedure codes in step (I) above. Another difference is that, depending on the other therapy type, additional features (such as use of a coil, etc.) are added to the code lookup hash table, equivalent to the addition of the laterality feature to the vessel in step (I). The treatment of the S&I codes is essentially the same as described above, although the candidate set of acceptable matching S&I codes for these additional therapies is often significantly more constrained than in the default angiogram case, and serves as a default S&I code in the case of ambiguity.

Step O in FIG. 1 concerns identifying procedures which may not be part of a vascular IR procedure, but which may have occurred during the physician's examination and which would need to be reported to a medical billing application. Additional non-vascular IR procedures are handled in one of two ways. Additional procedures commonly found in vascular IR notes, such as conscious sedation, are handled by the same mechanism as described in step (N). Alternatively, non-vascular IR procedures can be passed through to a high-coverage diagnostic radiology engine for analysis.

Step P in FIG. 1 concerns identifying medical diagnosis codes (ICD-9) associated with the procedures that took place. “Parsing” of medical facts in the chart leads to the creation of a set of features associated with each identified piece of diagnosis evidence; we can represent this information as a feature vector Fi, associated with an ICD code i. For example, given a chunk of text evidence e such as “history of left sided pains in the abdomen”, Fi might specify t=HISTORY (“type”, e.g. history, followup, rule-out, etc.), m=LEFT_SIDED (“modifier” of bodypart), h=ABDOMEN (linguistic “head” of bodypart), w=PAIN (“what”, i.e. symptom or diagnosis), reg=CLIN_HIST (chart region where this evidence was found), rule=R123 (rule used to assign code i based on this evidence), etc.

Note that feature values are not, strictly, words, even though they often look like words. Above we use uppercase to make the distinction clear; for example, PAIN is a more general feature value that can result from multiple alternative linguistic expressions (e.g. both “pain” and “pains”). The confidence associated with an ICD code is an estimate of Pr(icd=i|Fi); that is, the conditional probability that, given this combination of features, the ICD code associated with this evidence is ‘i.’
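As an illustration only, the feature vector Fi from the example above could be held in a small immutable record, with a lookup table that maps alternative surface words to one normalized feature value. The class name, the field layout, and the lookup table are assumptions made for this sketch; the feature names and values are taken from the example in the text.

```python
from typing import NamedTuple, Optional


class EvidenceFeatures(NamedTuple):
    """Hypothetical container for a diagnosis feature vector Fi."""
    t: str      # "type", e.g. HISTORY, FOLLOWUP, RULE_OUT
    m: str      # "modifier" of the body part
    h: str      # linguistic "head" of the body part
    w: str      # "what", i.e. symptom or diagnosis
    reg: str    # chart region where the evidence was found
    rule: str   # rule used to assign the code from this evidence


# Hypothetical normalization table: several surface forms, one feature value.
WHAT_VALUES = {"pain": "PAIN", "pains": "PAIN"}


def normalize_what(word: str) -> Optional[str]:
    """Map a surface word to its normalized feature value, if one is known."""
    return WHAT_VALUES.get(word.lower())


# Feature vector for "history of left sided pains in the abdomen".
example = EvidenceFeatures(
    t="HISTORY",
    m="LEFT_SIDED",
    h="ABDOMEN",
    w=normalize_what("pains"),
    reg="CLIN_HIST",
    rule="R123",
)
```

Because the record is hashable, identical feature combinations can be used directly as keys when frequencies are counted.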

The “ICD training” process analyzes charts that have been automatically coded and reviewed by human coders in order to identify combinations (i,e,Fi), where i is the ICD code assigned by the human coder, e is a piece of language evidence, and Fi is a vector of features resulting from NLP analysis. Since the human coders typically do not identify e explicitly (they add, delete, or change codes without pointing out where the evidence came from), obtaining the frequencies needed to estimate Pr(icd=i|Fi) requires an alignment process that takes explicit advantage of the fact that human coders in the workflow are interacting with the chart. One can view the set of coder actions as producing a set of events <action,i′,(i,e,Fi)>, where i′ is the post-review code and action is what the human coder did to get it there. These include: <CONFIRM, i, (i,e,Fi)>, <DELETE, NULL, (i,e,Fi)>, <CHANGE, i′, (i,e,Fi)>, and <ADD, i′, NULL>.

Analysis of these tuples leads to a table of raw frequencies; e.g. a CONFIRM for i,Fi leads us to increment the frequency of (i,Fi). These raw frequencies are converted to conditional probabilities Pr(i|Fi) using standard probability estimation techniques.
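A minimal sketch of this counting step follows, using plain relative frequencies as one possible "standard" estimator. Only the CONFIRM rule is stated in the text; how DELETE, CHANGE, and ADD events are counted here is an assumption of this sketch, as are the function name and the event layout.

```python
from collections import Counter


def estimate_probabilities(events):
    """Estimate Pr(i|Fi) from coder events by relative frequency.

    events: iterable of (action, post_review_code, evidence), where evidence
    is (engine_code, text, features) or None. features must be hashable.
    """
    pair_counts = Counter()     # frequency of (code, features)
    feature_counts = Counter()  # frequency of features, regardless of code
    for action, post_review_code, evidence in events:
        if evidence is None:
            # ADD: no aligned evidence, so nothing to count (assumption).
            continue
        engine_code, _text, features = evidence
        feature_counts[features] += 1
        if action == "CONFIRM":
            pair_counts[(engine_code, features)] += 1
        elif action == "CHANGE":
            # Assumption: the evidence supports the post-review code instead.
            pair_counts[(post_review_code, features)] += 1
        # DELETE (assumption): the features were seen but support no code.
    return {
        (code, features): count / feature_counts[features]
        for (code, features), count in pair_counts.items()
    }
```

With sparse data, a smoothed estimator would likely be preferable to raw relative frequencies; the source does not say which technique is used.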

Step Q of FIG. 1 concerns passing an intermediate data structure to the user interface identifying specific locations in the medical report where facts were located as offsets of text to drive text highlighting in the user interface.

Step R in FIG. 1 concerns identifying the confidence in the billing codes generated by the NLP processing for the text examined, and reporting that confidence to the human coder.

As mentioned earlier, the prediction engine of the present invention makes predictions (e.g. of procedure and diagnosis codes) for a chart based on the metadata and language information present in the chart. In many predictive settings, the computational system not only makes its best prediction, but also reports or adjusts its behavior based on the extent to which its prediction can be considered reliable. This is a different problem from the prediction itself. Prediction can be viewed as seeking the choice C such that Pr(choice=C|evidence) is maximized, i.e. the best choice among alternatives given the evidence, whereas the reliability of a prediction abstractly concerns the probability Pr(choice C is correct|evidence), i.e. the probability that the yes-or-no answer to “Is C the right choice?” is yes. Methods for assessing that probability can be grouped together as “confidence assessment” techniques and they play a role in many predictive settings.

One of the features of a preferred embodiment of the present invention is a novel approach to confidence assessment that embeds the confidence assessment within a human-in-the-loop workflow that specifically involves human approval or lack of approval. In other words, a preferred aspect of the present invention assesses specifically the likelihood that a human coder will push the “approve” button in response to the engine's coding. This advantageously provides integration with an ongoing production process, versus confidence assessment techniques that use a (typically static) “gold standard” for correctness, i.e. a set of data held out from predictive-model estimation that is used to create a confidence model.

Another preferred feature of one embodiment of the present invention is a system and method for combining code-level confidence assessments in order to make a chart-level assessment. Yet another preferred feature of the present invention is using the above-mentioned chart-level confidence to route notes to different queues in order to optimize the human coder's experience, and optimize efficiency gains in the computer-assisted coding process.

The engine produces CPT (procedure) and ICD (diagnosis) codes, and every ICD code is linked to a single CPT code. (Often a single CPT code will have multiple ICD codes associated with it, but not vice versa.) We can therefore view the engine's predicted codes as defining a set of vectors (c,i,Fc,Fi,V), where c is the CPT code, i is the ICD code, Fc is a vector of evidence used in selecting the CPT code (i.e. maximizing Pr(c|Fc)), Fi is a vector of evidence used in selecting the ICD code (i.e. maximizing Pr(i|Fi)), and V is a vector of other information available in (or created for) the chart of the particular (i,c) pair. A preferred confidence assessment technique involves the following steps: 1. Training; 2. Runtime code-level confidence assessment; 3. Runtime chart-level confidence assessment; and 4. Confidence-based routing to queues.

The workflow of one embodiment of the presently preferred invention includes human coders who review notes and either approve them without changes, or modify the codes. Human action may also cause the dictation to be modified by the physician: when the coder identifies unclear language, the coder can ask the physician for an addendum to clarify the document. It will be understood that human coders can be located locally or remotely from the coding entity. The use of coder feedback from within the workflow is one of the advantages of the presently preferred invention. In the case where a human coder approves the notes, approval implies that every reported vector (c,i,Fc,Fi,V) involves a correct code pair (c,i), and therefore that approval action can be viewed as producing a “labeled” pair <(c,i,Fc,Fi,V),1>, where a label of 1 indicates “correct”. Conversely, if the note was not approved without modification, then, regardless of the specific changes, the action can be viewed as producing a “labeled” pair <(c,i,Fc,Fi,V),0>, where a label of 0 indicates “incorrect”. Given a data set containing <(c,i,Fc,Fi,V),label> items, a binary classifier is trained. Any supervised binary classification technique can be used, e.g. decision trees, support vector machines, etc.
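The labeling scheme can be sketched as follows; the function names and the shape of a reviewed chart are assumptions for illustration. Every vector reported for a chart receives the same label, determined solely by whether the coder approved the chart without changes.

```python
def label_chart(vectors, approved_without_changes):
    """Return <vector, label> pairs for one reviewed chart.

    vectors: the (c, i, Fc, Fi, V) tuples the engine reported for the chart.
    approved_without_changes: True if the coder approved the chart as is.
    """
    label = 1 if approved_without_changes else 0
    return [(vector, label) for vector in vectors]


def build_training_set(reviewed_charts):
    """Flatten reviewed charts into one labeled data set.

    reviewed_charts: iterable of (vectors, approved_without_changes) pairs.
    """
    data = []
    for vectors, approved in reviewed_charts:
        data.extend(label_chart(vectors, approved))
    return data
```

The resulting list can then be passed to any supervised binary classifier after the feature vectors are encoded in the form that classifier expects.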

The separation in the workspace between medical facts and codes determined based on those facts may be used to differentially collect feedback from user-corrections in the GUI about the medical-fact-extractor and the fact-to-code mapper. These two feedback data streams may be used to separately train and/or refine two sub-systems: (1) the medical-fact-extractor; and (2) the fact-to-code mapper. This separation of users' feedback facilitates identifying the component system that requires change, and how it is to be changed.

When a user makes changes to the engine-derived codes, these changes are passed back through the engine logic for VasIR coding through a recalculating (“recalc”) process. This feature allows reusing the engine VasIR coding rules to verify that the human coder's rules are in the proper order.

Furthermore, the learning process feeds back user corrections to the engine in a very granular way, specifically adjusting how documents are parsed for medical facts, as well as adjusting the specific code mapping that is part of VasIR billing rules.

In one embodiment of the present invention, a decision tree classifier is built. Each individual data feed has its own classifier, so that confidence assessment comprises modeling the specific approval decisions associated with that customer and site. There are a large number of features in Fc, Fi, and V. Examples include linguistic features in the chart evidence. For example, one feature indicates that the diagnosis term evidence was headed by the term “fracture”; another, engine-determined feature reflects that an ICD code for “ankle fracture” was superseded by another diagnosis, such as “ankle pain”, in the logic for most-certain coding.

The binary classifier computes a confidence in the range [0,1] for each input (c,i,Fc,Fi,V) corresponding to reported CPT and ICD codes of c and i, respectively. This can be interpreted as Pr(label=1|(c,i,Fc,Fi,V)), i.e., the probability that this CPT and ICD combination is correct.

In order to combine evidence about multiple code reports (c,i), the minimum value of Pr(label=1|(c,i,Fc,Fi,V)) over all reported pairs is taken as the confidence value for the entire chart. Since compliance is an important focus of the present invention, a conservative strategy is employed by considering the entire set of codes for a chart only as strong as the weakest link. In one embodiment of the present invention, every code reported in the chart must pass a confidence threshold in order for the whole chart to pass that threshold.
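The "weakest link" combination is simply a minimum over the code-level confidences, as the following sketch shows. The function names are hypothetical, and the treatment of a chart with no reported codes (raising an error) is an assumption, since the text does not address that case.

```python
def chart_confidence(code_confidences):
    """Chart-level confidence: the minimum of the code-level confidences."""
    confidences = list(code_confidences)
    if not confidences:
        # Assumption: a chart with no reported codes has no defined confidence.
        raise ValueError("chart has no reported codes")
    return min(confidences)


def chart_passes(code_confidences, threshold):
    """True only if every reported code meets the threshold."""
    return chart_confidence(code_confidences) >= threshold
```

Checking the minimum against the threshold is equivalent to requiring that each reported code pass the threshold individually.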

The confidence assessment score has an intuitively clear interpretation. It is not just a heuristic value; rather, it can be viewed as a conservative estimate (because of the “weakest link” property) of the probability that a human coder (in the specific workflow associated with this customer site) would accept the entire chart as correct without making any changes.

Another embodiment of the present invention uses the confidence assessment values to determine whether or not a chart is considered correct as coded (and therefore appropriate for the “Confident” queue), or whether there is some element (c,i) that fails to meet the confidence threshold and requires human review (in which case the chart belongs in the “Review” queue). The confidence threshold may be set manually in concert with the customer. Typical threshold values (>95%) are much higher than typical levels of inter-coder agreement even between experienced human coders.

In addition to the modeling described above, the presently preferred invention may also use a rule-based approach to determine whether a chart should be treated as a note to be coded from scratch, e.g., if a crucial region like the impressions section is missing, or if the engine detected procedure or diagnosis evidence that it deemed not codable. These rules will route notes to the “Code” queue rather than the “Review” queue. However, it should be noted that many notes in the “Code” queue do not wind up requiring coding from scratch: often the engine may have found and reported correct codes that are kept on review by the human coder, even though the rules have caught a serious issue that causes the engine to flag the entire chart as one that needs careful human attention. This underscores the focus on conservative approaches to ensure compliance.
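The three queues can be tied together in a short routing sketch. The queue names come from the text; the function name, the default threshold of 0.95 (chosen to match the typical values mentioned), the precedence of the rules over the confidence model, and the choice to send a chart with no reported codes to "Review" are all assumptions of this sketch.

```python
def route_chart(code_confidences, rule_flags=(), threshold=0.95):
    """Pick a queue for a chart.

    code_confidences: code-level confidences in [0, 1] for the chart.
    rule_flags: names of rule-based issues found, e.g. a missing
        impressions section or evidence deemed not codable.
    """
    if rule_flags:
        # A rule caught a serious issue: treat the chart as needing
        # careful human attention, possibly coding from scratch.
        return "Code"
    confidences = list(code_confidences)
    if confidences and min(confidences) >= threshold:
        return "Confident"
    # Some code fell below the threshold (or none were reported).
    return "Review"
```

For example, a chart whose codes all score 0.97 or higher but whose impressions section is missing would still be routed to the "Code" queue under this ordering.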

Another feature of one embodiment of the present invention is providing a mechanism for outputting the resulting codes from the NLP process to a GUI optimized for a human coder to review the results and make changes as appropriate. The NLP-based coding engine does not always produce perfect results. However, the accuracy and consistency of the results may be greatly improved through the correction of the results by human coders. The interaction between the NLP-based engine and human coders is an activity in which the person and the machine collaborate on achieving a correct coding outcome. In one embodiment of the present invention, this interaction takes place through a custom GUI designed to display textual and graphical information that can be approved or disapproved by a human coder. The end result of that collaboration may be used for training the engine to consistently produce more accurate results.

The NLP-based coding engine can be broken out into two components, each with separate characteristics: (1) NLP concept and context extractor: scan text, identify medical facts; and (2) fact-to-code mapper (in the case of VasIR, the facts are blood vessels, laterality, procedures, and entry points, i.e. “approaches”). In the case of VasIR, the mapper is an algorithmic mapping from the facts extracted by component (1) to CPT codes. Component (2) is built in a way that completely covers all the logic for a particular medical coding domain (e.g., vascular IR, or Evaluation and Management).

FIG. 2 shows how, for each ICD code 10 found by the NLP engine, both the code and the description 11 may be displayed for the user. Procedure codes may be presented in order of “type” (e.g., surgical, S&I, non-billed) and in order of relative value unit, largest to smallest. Coders may enter additional ICD codes in input box 13. When they click the “Add ICD Code” button 14, the description for the code added is pulled from the database. ICD codes are associated with CPT codes by checking check boxes 15 that precede the ICD code in the right-most column of the CPT table. The approach 16 for the procedure is displayed above the CPT table. Engine-supplied CPT codes can be changed by the user by double-clicking on the CPT code. A warning message (not shown in FIG. 2), however, may be displayed to indicate that the override will circumvent the engine before the user is allowed to edit the code. Preferably, once the coder has accepted the warning message, the coder is not interrupted again for other changes.

FIG. 3 shows an interface for checking the validity of the codes entered. An error message 20 is displayed in the event of a conflict between the NLP engine's result and the human coder's result. The interface has built-in validations to prevent a coder from approving a report that is either incomplete, or that violates certain or all coding rules. Row L2 in FIG. 3 shows how errors and warnings are highlighted to the user. Validity messages may include a visual indication 21 in the form of an icon (red X for errors, yellow ! for warnings), the row identifier in either the ICD or CPT table, and a detailed description of the issue the coder must resolve, or at least be aware of. The primacy validation ensures that the ICD code in the primary location is correct for all procedures.

FIG. 4 demonstrates a highlighting feature of the interface. This mechanism shows the specific text 30 that identifies the procedure code in the report by highlighting it and automatically scrolling the report to the exact location of the text for user review. To assist the coder in quickly visualizing the language in the report that the NLP engine identified for each procedure, when a user clicks on a row (for example, row L2 in FIG. 4) in the procedure table, the text in the report is highlighted. If the highlighted text is not visible in the report window, the interface may automatically adjust the window to display the highlighted text. Similarly, as shown in FIG. 4, vessels 31 in the graphic are highlighted and the view is centered on the highlighted vessel(s). Preferably, the interface also provides an integrated encoder function that allows the user to reference medical codes and their meaning. It should also be noted that the colors chosen for highlighting are ADA compliant, allowing visually impaired individuals to understand displayed information.

Another preferred feature of the present invention is a mechanism for the NLP engine and Coding-Workspace based GUI to interact. When a medical report is processed by the system, the NLP engine runs in its entirety (see FIG. 1). This includes the medical fact extractor (steps A through H) and the subsequent fact-to-code mapper (steps I through L). The engine-extracted medical facts are presented in the graphical user interface (GUI) on the left side of the coding workspace, and the corresponding codes are presented on the right side. The graphical interface allows zooming, panning, and side views of the images. The user can then change, add, or remove medical facts. The user can click a recalculate button that runs the latest set of medical facts through the fact-to-code component of the NLP engine, and updates the codes. The process of changing medical facts and recalculating can then be repeated.

Yet another preferred feature of the present invention is a learning feedback loop that allows changes made by the human coder to be identified specifically and sent back to the NLP engine. A human coder using a mouse can select text missed by the engine, identify the anatomy feature in the vascular path, and paste the resulting procedure code into the medical billing results. The text, location, and procedure code are sent back to the NLP engine server to allow the missed language to be learned by the technology and appropriate adjustments made to the engine.

To add a procedure or vessel in the VasIR interface, the coder must provide text evidence, specify the type of procedure and the side, and select a valid vessel or procedure. FIG. 5A shows the dialog 40 that appears after the language evidence is highlighted within the report.

As demonstrated in FIGS. 5B and 5C, to assist the coder in finding the correct vessel or procedure, as the coder types the Procedure or Vessel name in input field 50, a list of valid selections 51 is displayed that matches what the user has typed. This is an active search against the database of valid procedures and vessels; thus, as the user continues to type in input field 50, the list of possible matches is reduced until the user selects the correct value by highlighting it in the list of selections 51 and clicking the mouse button. FIGS. 5B and 5C show an example of a user typing “Illiac” to find the “External Illiac,” which matches the highlighted text evidence.
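The narrowing behavior of the active search can be sketched as a case-insensitive substring filter. This is an assumption consistent with the example, where the typed text matches the end of the vessel name; the in-memory list standing in for the database, the function name, and every vessel name other than the one quoted in the text are hypothetical.

```python
def matching_selections(typed, valid_names):
    """Return the valid names that contain what the user has typed so far."""
    needle = typed.strip().lower()
    if not needle:
        return []
    return [name for name in valid_names if needle in name.lower()]


# Hypothetical stand-in for the database of valid procedures and vessels.
VALID_NAMES = ["Common Illiac", "External Illiac", "Internal Illiac", "Renal"]
```

As more characters are typed, the candidate list can only stay the same or shrink, which is the reduction described above.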

As shown in FIGS. 6A, 6B and 6C, a graphical anatomy reference tool displays the vascular anatomy to the human coder, providing a reference consisting of a complete vascular diagram of a patient. Vascular diagram 60 adjusts appropriately to the gender of the patient (there are anatomical differences between genders in the vasculature) as well as common aortic anomalies (such as the “bovine arch”). The path of the catheter as reported in the results is highlighted in the area designated 61 in the graphical anatomy display. Also highlighted in the area designated 61 is any missed anatomy in either the report or the coding (engine or human).

As demonstrated in FIG. 4, in one embodiment of the present invention, the interactive, graphical diagram of the vessel anatomy used in the Vascular IR interface translates NLP observations into visual paths representing the path of the catheter. The visual path is set against a gray-scale, static canvas which provides a background of all the vessels, each of which is assigned a label 32 specifying the common name of the vessel. Upon the loading of the coding information for a given note, those vessels which have been identified by the engine or the coder are “activated” in red. To accomplish this, a vessel segment graphic with the red coloration is overlaid on the background at the coordinates stored in an XML database containing all vessel segment information.

The NLP engine provides a series of line items which correspond to observations in the note. In most cases, the observations are associated with a single vessel or vessel pair (in the case of bilateral). The vessel information is provided using an internal format that identifies the laterality and a terse vessel id (known as the NLP Id). The VasIR interface maps these identifiers to the vessel image file. Additionally, the mapping provides the coordinates of the image segment relative to the canvas. When the diagram is loaded, these image segments are overlaid on the canvas, thus hiding the corresponding gray-scale portion. The mapping also provides the coordinates of the label for the vessel segment so that medical codes (ICD and CPT) relevant to each can be displayed for reference purposes.

Each vessel has one of the following nodes associated with it. The “name” element is the label used on the diagram. The “resourceId” element is the basename of the image file to be placed on the canvas. The “nlpId” element is the identifier provided by the NLP engine to represent a vessel observation. The “type” attribute future-proofs the database to allow for other vessel types such as veins, stents or grafts. The laterality is one of left, right or blank for unilateral vessels. Finally, the “x”, “y”, “labelX”, and “labelY” elements are the coordinates of the vessel segment and the vessel's CPT code placement relative to the canvas.
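To make the node description concrete, the sketch below parses a hypothetical XML fragment into a lookup keyed by NLP id and laterality. The element and attribute names follow the description above, but the enclosing "vessels" and "vessel" tags, the sample values, and the loader function are assumptions; the actual database schema is not reproduced in this text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag nesting and all values are illustrative only.
SAMPLE_XML = """
<vessels>
  <vessel type="artery">
    <name>External Illiac</name>
    <resourceId>external_illiac_left</resourceId>
    <nlpId>EXTIL</nlpId>
    <laterality>left</laterality>
    <x>120</x><y>340</y><labelX>150</labelX><labelY>352</labelY>
  </vessel>
  <vessel type="artery">
    <name>Aorta</name>
    <resourceId>aorta</resourceId>
    <nlpId>AORTA</nlpId>
    <laterality></laterality>
    <x>200</x><y>100</y><labelX>230</labelX><labelY>110</labelY>
  </vessel>
</vessels>
"""


def load_vessels(xml_text):
    """Index vessel nodes by (nlpId, laterality) for lookup at display time."""
    root = ET.fromstring(xml_text.strip())
    vessels = {}
    for node in root.iter("vessel"):
        # Blank laterality denotes a unilateral vessel.
        laterality = (node.findtext("laterality") or "").strip()
        key = (node.findtext("nlpId").strip(), laterality)
        vessels[key] = {
            "name": node.findtext("name").strip(),
            "resourceId": node.findtext("resourceId").strip(),
            "type": node.get("type"),
            "position": (int(node.findtext("x")), int(node.findtext("y"))),
            "label_position": (
                int(node.findtext("labelX")),
                int(node.findtext("labelY")),
            ),
        }
    return vessels
```

Given an engine line item that carries an NLP id and a laterality, the interface would look up the matching entry to find which image to overlay and where to place it and its label.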

In order to provide a rich user experience, the information from this database is loaded into memory and an object representing each node is propagated to the UI in the form of JavaScript data structures. It is the JavaScript that responds to user interactions, placing segments on the canvas, highlighting the selected vessel in gold, and allowing the diagram to be panned while keeping the vessels in place relative to the canvas.

In another embodiment, the present invention uses a database containing anatomical phrases used to describe anatomy, the procedure codes they refer to, and their relationship to each other (the chains of possible vascular paths), and ties them to the above-described graphical diagram, which can be rendered interactively for the human coder as a reference tool.

In yet another embodiment of the present invention, the database and user interface are made available to users through the Internet using an HTTP/HTML browser supporting JavaScript and Java. In another embodiment of the present invention, wireless devices, such as cell phones and PDAs, can be used to access the system described herein. This allows delivery of the functionality to the users without any installation or locally installed components, facilitating software updates without any action by the user.

A number of embodiments of the present invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, various implementations may change the order in which operations are performed; for instance, the order of the steps in FIG. 1 may be changed. Therefore, depending on the needs and preferences of an implementation, particular operations may be implemented as a combined operation, eliminated, added to, or otherwise rearranged. Accordingly, other embodiments are within the scope of the following claims.

What is claimed:
 1. A method comprising: receiving, from a remote system via a network connection, a first medical report containing natural language; statistically analyzing, using a computer, a set of training medical reports having pre-annotated sentence elements, wherein each sentence element is a sentence or a phrase; calculating, using the computer, a conditional probability for each sentence element of the set of training medical reports, the conditional probability indicating the probability that the sentence element belongs to a specific functional region in the set of training medical reports; calculating, using the computer and the conditional probability for each sentence element of the set of training medical reports, a run-time probability for each sentence element of the first medical report, the run-time probability indicating the probability that the sentence element belongs to a specific functional region in the first medical report; and assigning, using the computer and the run-time probability, each sentence element of the first medical report to the corresponding specific functional region in the first medical report.
 2. The method of claim 1, further comprising: extracting, by the computer, at least one language-based feature from the functional regions of the first medical report; identifying, via the computer, at least one standardized code associated with the extracted at least one language-based feature; generating, via the computer, a confidence assessment for the at least one standardized code, wherein the generated confidence assessment indicates the likelihood that the identified standardized code is correct given evidence including the at least one extracted language-based feature; and generating a medical report-level confidence assessment based on the confidence assessment associated with the at least one standardized code.
 3. The method of claim 1, wherein the first medical report comprises a set of medical reports.
 4. The method of claim 1, wherein the first medical report comprises information relating to one or more medical reports, the information being obtained from multiple remote data feeds and merged to create the first medical report.
 5. The method of claim 1, where the step of calculating the run-time probability for each sentence element of the first medical report uses a Markov model.
 6. The method of claim 2, further comprising routing the first medical report to one of a plurality of queues based on the report-level confidence assessment.
 7. The method of claim 1, further comprising calculating a regional transition probability between adjacent sentence elements in the set of training medical reports, and using the regional transition probability in the step of calculating the run-time probability for each sentence element of the first medical report.
 8. A system comprising: one or more first computers, each of the first computers being configured to perform a regioning process, the regioning process assigning each sentence element of a first medical report containing natural language to a corresponding functional region in the first medical report by: receiving, by the first computers from a remote system via a network connection, the first medical report; statistically analyzing, using the first computers, a set of training medical reports having pre-annotated sentence elements, wherein each sentence element is a sentence or a phrase; calculating, using the first computers, a conditional probability for each sentence element of the set of training medical reports, the conditional probability indicating the probability that the sentence element belongs to a specific functional region in the set of training medical reports; calculating, using the first computers and the conditional probability for each sentence element of the set of training medical reports, a run-time probability for each sentence element of the first medical report, the run-time probability indicating the probability that the sentence element belongs to a specific functional region in the first medical report; and assigning, using the computer and the run-time probability, each sentence element of the first medical report to the corresponding specific functional region in the first medical report.
 9. The system of claim 8, wherein each of the first computers is further configured to identify a set of one or more standardized codes and generate a confidence assessment for the identified set of standardized codes by: extracting, by the computer, at least one language-based feature from the functional regions of the first medical report; identifying, via the computer, one or more standardized codes associated with the extracted at least one language-based feature; and generating, via the computer, a confidence assessment for the one or more standardized codes, wherein the generated confidence assessment indicates the likelihood that the one or more standardized codes are correct given evidence including the at least one extracted language-based feature.
 10. The system of claim 8, wherein the first medical report comprises a set of medical reports.
 11. The system of claim 8, wherein the first medical report comprises information relating to one or more medical reports, the information being obtained from multiple remote data feeds and merged to create the first medical report.
 12. The system of claim 8, where the step of calculating the run-time probability for each sentence element of the first medical report uses a Markov model.
 13. The system of claim 9, further comprising a second computer, the second computer configured to perform a learning process by: receiving the one or more standardized codes identified by the one or more first computers and the confidence assessment generated by the one or more first computers for the one or more standardized codes; displaying, via a graphical user interface, the one or more standardized codes to a human coder; receiving, for the one or more standardized codes, an indication as to whether a human coder approved or modified the one or more standardized codes; adjusting the confidence assessment generating step based on the received indications as to whether a human coder approved or modified the one or more standardized codes; selecting one of a plurality of queues based on the generated confidence assessment; and routing the medical report to the selected queue.
 14. The system of claim 13, wherein the second computer is different from any of the one or more first computers.
 15. The system of claim 8, wherein each of the first computers is further configured to calculate a regional transition probability between adjacent sentence elements in the set of training medical reports, and use the regional transition probability in the step of calculating the run-time probability for each sentence element of the first medical report.